Abstract. Two formalisms have recently been proposed to perform a non-uniform
random generation of combinatorial objects based on context-free grammars.
The former, introduced by Denise et al., associates weights with letters, while
the latter, recently explored by Weinberg et al. in the context of random
generation, associates weights with transitions. In this short note, we use a
trivial modification of the Greibach Normal Form transformation algorithm, due
to Blum and Koch, to show that these two formalisms are equally expressive.



1. Introduction 

The random generation of combinatorial objects is one of the natural applications 
of enumerative combinatorics. Following general principles outlined by Wilf [12],
Flajolet et al [8] proposed a fully-automated algebraic approach for the extensive 
class of decomposable combinatorial objects, a large class of objects that includes 
context-free languages. This pioneering work was later completed by the introduction
of Boltzmann samplers, an alternative family of random generation algorithms
based on analytical properties of the underlying generating functions [7]. However,
these works only addressed the uniform distribution, while many applications of
random generation (e.g. in RNA bioinformatics [6]) require non-uniform distributions
to be modeled.

To that purpose, Denise et al. [4] introduced (terminal)-weighted grammars, a
non-uniform framework where the terminal symbols (letters) are associated with
a positive real value, inherited multiplicatively by words in the language. Such
weights were then used, through a trivial renormalization, to induce a probability
distribution on the finite set of words of a given length. Generic random generation
algorithms were proposed [1] and implemented within a general random generation
toolbox [10]. Analytic and numerical approaches were proposed for figuring
out a suitable set of weights that would mimic a given, observed, distribution [3].
Finally, a multidimensional rejection scheme was explored to sample words of a
given composition, yielding efficient algorithms by generalizing the principles of
Boltzmann sampling [2].

More recently, Weinberg et al. [11] proposed an alternative definition for weighted
grammars, associating positive real values to rules instead of terminal letters. The
authors proposed a random generation procedure based on formal grammar
manipulations, followed by a call to an unranking algorithm due to Martinez and
Molinero [9]. However, the relative expressivities of the two formalisms, in terms
of the distributions induced by their respective weighting schemes, were not compared.

In this short note, we establish the equivalence of the two formalisms with
respect to their induced distributions. After this short introduction, we recall in
Section 2 the definitions of terminal-weighted and rule-weighted grammars. Then
we turn to an analysis of the relative expressivities of the two formalisms, and
establish in Section 3 that any terminal-weighted grammar can be simulated by a
rule-weighted grammar. Furthermore, we use a Greibach Normal Form transformation
to prove, in Section 4, that any rule-weighted grammar can be transformed
into a terminal-weighted grammar inducing the same probability distribution, from
which one concludes that the two formalisms are equivalent. We conclude in
Section 5 with some closing remarks and perspectives.

2. Definitions 

A context-free grammar is a 4-tuple $\mathcal{G} = (\Sigma, \mathcal{N}, \mathcal{P}, S)$ where

• $\Sigma$ is the alphabet, i.e. a finite set of terminal symbols, also called letters.

• $\mathcal{N}$ is a finite set of non-terminal symbols.

• $\mathcal{P}$ is the finite set of production rules of the form $N \to X$, where
$N \in \mathcal{N}$ is a non-terminal and $X \in (\Sigma \cup \mathcal{N})^*$ is a
sequence of letters and non-terminals.

• $S$ is the axiom of the grammar, i.e. the initial non-terminal.

We will denote by $\mathcal{L}(\mathcal{G})_n$ the set of all words of length $n$
generated by $\mathcal{G}$. This set is generated by iteratively applying production
rules to non-terminals until a word in $\Sigma^*$ is obtained.

Note that the non-terminals on the right-hand side of a production rule can
be independently derived. It follows that the derivation process, starting from
the initial axiom and ending with a word $w$ over the terminal alphabet, can be
represented by a parse tree $d_w$. This (ordered directed) tree associates production
rules to each internal node and terminal letters to each leaf, such that the $i$-th child
of a node labeled with $N \to x_1.\cdots.x_k$ is either a terminal letter
$x_i \in \Sigma$ or a further derivation of $x_i \in \mathcal{N}$, starting from a
root node that derives the axiom $S$.

Assumptions: Let us assume, for the sake of simplicity, that the grammars
considered in the following are unambiguous, i.e. that any word in
$\mathcal{L}(\mathcal{G})_n$ has exactly one associated parse tree. Moreover, let us
assume, without loss of generality, that the grammar is given using a binary variant
of the Chomsky Normal Form (CNF), which partitions the non-terminals into four
classes, restricting their production rules to:

• Axiom: $S \to N$, $N \in \mathcal{N}$, and/or $S \to \varepsilon$.

• Unions: $N \to N' \mid N''$, such that $N, N', N'' \in \mathcal{N} \setminus \{S\}$.

• Products: $N \to N'.N''$, such that $N, N', N'' \in \mathcal{N} \setminus \{S\}$.

• Terminals: $N \to t$, $t \in \Sigma$.

Finally, we will postulate the absence of non-productive non-terminals, e.g. those
having only rules of the form $N \to N.N'$.

2.1. Terminal-Weighted Grammars. A non-uniform distribution can be postulated
on the language generated by the grammar. To that purpose, two formalisms
have been independently proposed, recalled here for the sake of completeness.



RULE-WEIGHTED VS TERMINAL-WEIGHTED CONTEXT-FREE GRAMMARS 






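Definition 2.1 ((Terminal)-Weighted Grammar [4]). A terminal-weighted grammar
$\mathcal{G}_\pi$ is a 5-tuple $\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, S)$ where:

• $(\Sigma, \mathcal{N}, \mathcal{P}, S)$ defines a context-free grammar,

• $\pi : \Sigma \to \mathbb{R}^+$ is a terminal-weighting function that associates a
non-null positive real-valued weight $\pi_t$ to each terminal symbol $t$.

The weight of a word $w \in \mathcal{L}(\mathcal{G}_\pi)$ is then given by

$$\pi(w) = \prod_{t \in w} \pi_t,$$

where the product ranges over the letter occurrences of $w$, and extended into a
probability distribution over $\mathcal{L}(\mathcal{G}_\pi)_n$ by

$$\mathbb{P}_{\pi,n}(w) = \frac{\pi(w)}{\sum_{w' \in \mathcal{L}(\mathcal{G}_\pi)_n} \pi(w')}.$$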
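As an illustration of Definition 2.1, the following minimal Python sketch computes word weights and the renormalized distribution over a finite set of words of the same length. The weights in `PI`, the function names, and the choice of all length-2 words over {a, b} as a stand-in for the set of words of length 2 are assumptions made for the example; they do not come from the cited works.

```python
from fractions import Fraction
from math import prod

# Hypothetical letter weights pi_t (illustrative values only).
PI = {"a": Fraction(2), "b": Fraction(1)}

def word_weight(word, pi):
    """pi(w): product of the weights of the letter occurrences in w."""
    return prod((pi[t] for t in word), start=Fraction(1))

def distribution(words, pi):
    """Renormalize the weights over a finite set of words of equal length."""
    total = sum(word_weight(w, pi) for w in words)
    return {w: word_weight(w, pi) / total for w in words}

# All words of length 2 over {a, b} stand in for the words of length 2.
probs = distribution(["aa", "ab", "ba", "bb"], PI)
```

Exact rationals are used so that the probabilities sum to exactly 1; with these weights, the word aa is four times as likely as bb.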

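2.2. Rule-Weighted Grammars.

Definition 2.2 ((Rule)-Weighted Grammar [11]). A rule-weighted grammar
$\mathcal{G}_\lambda$ is a 5-tuple $\mathcal{G}_\lambda = (\lambda, \Sigma, \mathcal{N}, \mathcal{P}, S)$ where:

• $(\Sigma, \mathcal{N}, \mathcal{P}, S)$ defines a context-free grammar,

• $\lambda : \mathcal{P} \to \mathbb{R}^+$ is a rule-weighting function that associates a
positive non-null real-valued$^1$ weight $\lambda_r$ to each derivation
$r \in \mathcal{P}$, using the notation $N \to_y X$ to indicate the association of a
weight $\lambda_r = y$ to a rule $r = (N \to X)$.

The weight function $\lambda$ can then be extended multiplicatively over
$\mathcal{L}(\mathcal{G}_\lambda)$ through

$$\lambda(w) = \prod_{r = (N \to_{\lambda_r} X) \in d_w} \lambda_r, \quad \forall w \in \mathcal{L}(\mathcal{G}_\lambda),$$

where $d_w$ is the (unique) parse tree of $w$ in $\mathcal{G}_\lambda$. This induces a
probability distribution over $\mathcal{L}(\mathcal{G}_\lambda)_n$ such that

$$\mathbb{P}_{\lambda,n}(w) = \frac{\lambda(w)}{\sum_{w' \in \mathcal{L}(\mathcal{G}_\lambda)_n} \lambda(w')}.$$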
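The multiplicative extension of rule weights to words can be sketched as follows. This is only an illustration: the two-rule grammar, its weights, and the tuple encoding of parse trees are hypothetical, and the grammar is deliberately not in the CNF variant assumed above, to keep the example short.

```python
from fractions import Fraction

# Hypothetical rule-weighted grammar S ->_3 a.S.b | S ->_(1/2) a.b.
# Each entry: rule name -> (weight, right-hand side); uppercase = non-terminal.
RULES = {
    "r1": (Fraction(3), ["a", "S", "b"]),
    "r2": (Fraction(1, 2), ["a", "b"]),
}

def tree_weight(tree):
    """Weight of a word: product of the weights of the rules in its parse tree."""
    rule, children = tree
    weight = RULES[rule][0]
    for child in children:
        weight *= tree_weight(child)
    return weight

def tree_word(tree):
    """Read the derived word off the parse tree, left to right."""
    rule, children = tree
    pending = iter(children)  # one subtree per non-terminal of the rule
    return "".join(
        tree_word(next(pending)) if sym.isupper() else sym
        for sym in RULES[rule][1]
    )

# Parse tree of "aaabbb": r1 applied twice, then r2.
tree = ("r1", [("r1", [("r2", [])])])
```

Here the parse tree uses r1 twice and r2 once, so the weight of aaabbb is 3 · 3 · 1/2.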



3. Any terminal-weighted distribution can be obtained using a
rule-weighted grammar

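Theorem 3.1. For any terminal-weighted grammar $\mathcal{G}_\pi$, there exists a
rule-weighted grammar $\mathcal{G}_\lambda$, $\mathcal{L}(\mathcal{G}_\pi) = \mathcal{L}(\mathcal{G}_\lambda)$,
inducing an identical probability distribution.

Proof. We give a constructive proof of this theorem. For any grammar
$\mathcal{G}_\pi = (\pi, \Sigma, \mathcal{N}, \mathcal{P}, S)$, let us consider the
rule-weighted grammar defined by $\mathcal{G}_\lambda := (\lambda, \Sigma, \mathcal{N}, \mathcal{P}, S)$,
such that $\lambda(N \to t) = \pi_t$ and $\lambda(\cdot) = 1$ otherwise.

Clearly, the production rules and axioms of $\mathcal{G}_\lambda$ and $\mathcal{G}_\pi$
are identical, therefore one has $\mathcal{L}(\mathcal{G}_\lambda) = \mathcal{L}(\mathcal{G}_\pi)$
and, in particular,

$$\mathcal{L}(\mathcal{G}_\lambda)_n = \mathcal{L}(\mathcal{G}_\pi)_n, \quad \forall n \ge 0.$$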



$^1$ More precisely, Weinberg et al. restrict their formalism to rational weights, based on the
rationale that real numbers would lead to unstable computations. However, their framework could
easily be extended to any computable real numbers without loss of precision, e.g. by implementing
the confidence intervals approach described in Denise and Zimmermann [5]; therefore we consider
a trivial extension of this formalism here.

Let us now remark that any terminal letter $t$ in a produced word $w$ results from
the application of a rule of the form $N \to t$, and that the parse trees in
$\mathcal{G}_\pi$ and $\mathcal{G}_\lambda$ of any word
$w \in \mathcal{L}(\mathcal{G}_\pi) = \mathcal{L}(\mathcal{G}_\lambda)$ are identical. It follows
that the occurrences of the terminal letter $t$ in $w$ are in bijection with the
occurrences of the $N \to t$ rule in its parse tree $d_w$, and therefore

$$\lambda(w) = \prod_{r \in d_w} \lambda_r = \prod_{t \in w} \pi_t = \pi(w).$$

Since $\mathcal{L}(\mathcal{G}_\lambda)_n = \mathcal{L}(\mathcal{G}_\pi)_n$, then one has

$$\sum_{w \in \mathcal{L}(\mathcal{G}_\lambda)_n} \lambda(w) = \sum_{w \in \mathcal{L}(\mathcal{G}_\pi)_n} \pi(w),$$

and we conclude that, for any length $n \ge 0$, one has

$$\mathbb{P}_{\lambda,n}(w) = \frac{\lambda(w)}{\sum_{w' \in \mathcal{L}(\mathcal{G}_\lambda)_n} \lambda(w')} = \frac{\pi(w)}{\sum_{w' \in \mathcal{L}(\mathcal{G}_\pi)_n} \pi(w')} = \mathbb{P}_{\pi,n}(w),$$

which proves our claim. □
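The construction used in this proof is short enough to be sketched in Python. The toy grammar below (in the binary CNF variant, generating only ab and ba), its weights, and the helper name `to_rule_weighted` are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import prod

def to_rule_weighted(rules, pi):
    """Set the weight of N -> t to pi_t, and the weight of any other rule to 1."""
    lam = {}
    for lhs, rhs in rules:
        is_terminal_rule = len(rhs) == 1 and rhs[0] in pi
        lam[(lhs, rhs)] = pi[rhs[0]] if is_terminal_rule else Fraction(1)
    return lam

# Hypothetical grammar: axiom, union (two rules), products, terminals.
RULES = [
    ("S", ("N",)),
    ("N", ("P",)), ("N", ("Q",)),
    ("P", ("A", "B")), ("Q", ("B", "A")),
    ("A", ("a",)), ("B", ("b",)),
]
PI = {"a": Fraction(2), "b": Fraction(5)}
LAM = to_rule_weighted(RULES, PI)

# Rules used in the (unique) parse tree of "ab".
derivation_ab = [RULES[0], RULES[1], RULES[3], RULES[5], RULES[6]]
rule_weight = prod(LAM[r] for r in derivation_ab)
letter_weight = prod(PI[t] for t in "ab")
```

Only the two terminal rules contribute a factor different from 1, so the rule-based and letter-based weights of ab coincide, as the bijection argument predicts.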

4. Any rule-weighted distribution can be obtained using a 
terminal-weighted grammar 

Theorem 4.1. For any rule-weighted grammar $\mathcal{G}_\lambda$, there exists a
terminal-weighted grammar $\mathcal{G}_\pi$, $\mathcal{L}(\mathcal{G}_\lambda) = \mathcal{L}(\mathcal{G}_\pi)$,
inducing an identical probability distribution.

Proof. Let us first recall the definition of the Greibach Normal Form (GNF),
which requires each production rule to be of the form:

• $S \to \varepsilon$, where $S$ is the axiom,

• $N \to t.X$, where $t \in \Sigma$ and $X \in (\Sigma \cup \mathcal{N})^*$.

Based on Lemma 4.2, proven below, we know that any rule-weighted grammar
in Chomsky Normal Form can be transformed into a GNF grammar that generates
the same language and induces the same distribution. Let us then assume, without
loss of generality, that the input grammar
$\mathcal{G}_\lambda = (\lambda, \Sigma, \mathcal{N}, \mathcal{P}, S)$ is in GNF.

By duplicating the vocabulary, one easily builds a terminal-weighted grammar
that induces the same probability distribution as $\mathcal{G}_\lambda$. Namely, let us
define $\mathcal{G}_\pi = (\pi, \Sigma_\pi, \mathcal{N}, \mathcal{P}_\pi, S)$ such that
$\Sigma_\pi := \{t_r\}_{r \in \mathcal{P}}$, $\pi(t_r) := \lambda(r)$, and

$$\mathcal{P}_\pi := \{N \to t_r.X \mid r = (N \to_x t.X) \in \mathcal{P}\} \cup \{S \to \varepsilon \mid S \to_x \varepsilon \in \mathcal{P}\}.$$

Clearly, each terminal letter in a word produced by $\mathcal{G}_\pi$ can be
unambiguously associated with a rule of $\mathcal{G}_\lambda$, therefore the weight of any
non-empty word is preserved. Furthermore, the generated languages of
$\mathcal{G}_\lambda$ and $\mathcal{G}_\pi$ are identical, so the distribution
is preserved. Finally, the weight of the empty word $\varepsilon$, implicitly set to 1 in
the new grammar, may generally differ from its original value
$\lambda(S \to \varepsilon)$ in $\mathcal{G}_\lambda$. However, $\varepsilon$
is the only word of length 0, and therefore has probability 1 in both grammars. We
conclude that the probability distribution induced by $\mathcal{G}_\lambda$ is the same
as that of $\mathcal{G}_\pi$.
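The vocabulary duplication can be sketched as follows, under the assumption that the GNF rules are given as tuples (left-hand side, leading letter, tail, weight). The grammar, the naming scheme for the fresh letters, and the function name are hypothetical, and the rule for the empty word is left out for brevity.

```python
from fractions import Fraction
from math import prod

def duplicate_vocabulary(gnf_rules):
    """Map each GNF rule r = (N ->_x t.X) to N -> t_r.X with pi(t_r) = x."""
    new_rules, pi = [], {}
    for index, (lhs, letter, tail, weight) in enumerate(gnf_rules):
        fresh = f"{letter}_{index}"  # one fresh letter t_r per rule r
        pi[fresh] = weight
        new_rules.append((lhs, fresh, tail))
    return new_rules, pi

# Hypothetical GNF grammar: S ->_3 a.S.B | S ->_(1/2) a.B ; B ->_7 b
GNF = [
    ("S", "a", ("S", "B"), Fraction(3)),
    ("S", "a", ("B",), Fraction(1, 2)),
    ("B", "b", (), Fraction(7)),
]
NEW_RULES, PI = duplicate_vocabulary(GNF)

# "aabb" is derived with rules 0, 1, 2, 2; its image uses the fresh letters.
image = ["a_0", "a_1", "b_2", "b_2"]
image_weight = prod(PI[t] for t in image)
rule_weight = GNF[0][3] * GNF[1][3] * GNF[2][3] * GNF[2][3]
```

Each letter of the image word records which rule produced it, so the product of letter weights recovers the product of rule weights.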

Lemma 4.2. For any rule-weighted grammar
$\mathcal{G}_\lambda = (\lambda, \Sigma, \mathcal{N}, \mathcal{P}, S)$, there exists a
grammar $\mathcal{H}_{\lambda'}$ in Greibach Normal Form inducing the same distribution.

Proof. Again, we use a constructive proof, showing that the weight distribution can
be preserved during the transformation of the grammar performed by the Blum and
Koch normalisation algorithm [1]. Let us state the algorithm:

(1) Renumber non-terminals in any order, starting with the axiom $S$, which becomes $N_1$.

(2) For $k = 1$ to $|\mathcal{N}|$, consider the non-terminal $N_k$:

(a) For each $r = (N_k \to_x N_j.X) \in \mathcal{P}$, such that $j < k$ and
$N_j \to_{x_1} X_1, N_j \to_{x_2} X_2, \ldots, N_j \to_{x_m} X_m$,
replace $r$ in $\mathcal{P}$ as follows:

Former rule(s)            New rule(s)
$N_k \to_x N_j.X$         $N_k \to_{x \cdot x_1} X_1.X$
                          $N_k \to_{x \cdot x_2} X_2.X$
                          ...
                          $N_k \to_{x \cdot x_m} X_m.X$



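(b) Fix any left-recursive non-terminal $N_k$ by replacing its rules as follows,
using an alternative chain-rule construct:

Former rule(s)                 New rule(s)
$N_k \to_{x_1} N_k.X_1$        $N'_k \to_{x_1} X_1.N'_k$ and $N'_k \to_{x_1} X_1$
...                            ...
$N_k \to_{x_m} N_k.X_m$        $N'_k \to_{x_m} X_m.N'_k$ and $N'_k \to_{x_m} X_m$
$N_k \to_{y_1} Y_1$            $N_k \to_{y_1} Y_1.N'_k$ and $N_k \to_{y_1} Y_1$
...                            ...
$N_k \to_{y_{m'}} Y_{m'}$      $N_k \to_{y_{m'}} Y_{m'}.N'_k$ and $N_k \to_{y_{m'}} Y_{m'}$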

(3) For $k = |\mathcal{N}|$ down to 1, consider the non-terminal $N_k$:

(a) For each $r = (N_k \to_x N_j.X) \in \mathcal{P}$, such that $j > k$ and
$N_j \to_{x_1} X_1, N_j \to_{x_2} X_2, \ldots, N_j \to_{x_m} X_m$,
replace $r$ in $\mathcal{P}$ as follows:

Former rule(s)            New rule(s)
$N_k \to_x N_j.X$         $N_k \to_{x \cdot x_1} X_1.X$
                          $N_k \to_{x \cdot x_2} X_2.X$
                          ...
                          $N_k \to_{x \cdot x_m} X_m.X$
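The weighted rule expansion shared by steps (2a) and (3a) can be sketched in Python as below. This is a minimal illustration, not the Blum and Koch implementation: the encoding of rules as (left-hand side, right-hand side, weight) tuples, the function name `expand`, and the small grammar are assumptions, and the function expects the expanded non-terminal to differ from the leading one (left recursion is the business of step (2b)).

```python
from fractions import Fraction

def expand(rules, lhs, leading):
    """Replace each rule lhs ->_x leading.X by the rules lhs ->_(x*x_i) X_i.X,
    one per rule leading ->_(x_i) X_i. Assumes lhs != leading."""
    targets = [(rhs, w) for (left, rhs, w) in rules if left == leading]
    result = []
    for left, rhs, w in rules:
        if left == lhs and rhs[:1] == (leading,):
            # Weights multiply: the new rule stands for two derivation steps.
            result.extend((left, t_rhs + rhs[1:], w * t_w) for t_rhs, t_w in targets)
        else:
            result.append((left, rhs, w))
    return result

# Hypothetical grammar fragment: N2 ->_2 N1.N3, N1 ->_3 a | ->_5 b.N3, N3 ->_1 c
RULES = [
    ("N2", ("N1", "N3"), Fraction(2)),
    ("N1", ("a",), Fraction(3)),
    ("N1", ("b", "N3"), Fraction(5)),
    ("N3", ("c",), Fraction(1)),
]
expanded = expand(RULES, "N2", "N1")
```

The rule for N2 is replaced by two rules with weights 2 · 3 and 2 · 5, while the rules of N1 and N3 are left untouched.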



One easily verifies that, after any iteration of step (2), the grammar no longer
contains any rule $N_j \to N_i.X$ such that $i \le j \le k$. In particular, this holds
for $N_k$ with $k = |\mathcal{N}|$ which, after the full execution of step (2), does not
depend on any non-terminal in leading position, and is therefore in GNF. Furthermore,
one may assume that, anytime a non-terminal $N_k$ is considered during step (3),
every $N_j$ such that $k < j$ is in GNF. Consequently, the expansion of $N_j$ only
creates rules that are GNF-compliant, thus $N_k$ is in GNF at the end of the iteration.

Let us denote by $\mathcal{H}_{\lambda'} = (\lambda', \Sigma, \mathcal{N}', \mathcal{P}', S)$
the rule-weighted grammar obtained at the end of the execution. One first remarks that
both the expansions (Steps (2a) and (3a)) and the chain-rule reversal (Step (2b))
preserve the generated language, so that the language generated by a non-terminal in
$\mathcal{G}_\lambda$ is also the language generated by its corresponding non-terminal
in $\mathcal{H}_{\lambda'}$. Furthermore, one can prove, by induction on the number of
derivations required to generate a word, that the induced probability distribution is
kept invariant by the rule substitutions operated by the algorithm.

To that purpose, let us first extend the definition of a rule-weighting function to
include a partial derivation instead of a single non-terminal. Namely, $\lambda_X(w)$
will represent the weight of $w$, as derived from $X \in (\mathcal{N} \cup \Sigma)^*$
and, in particular, one has $\lambda_S = \lambda$. Let us now consider the rule-weighting
functions $\lambda^\circ$ and $\lambda^\bullet$, respectively induced by the grammar
before and after a modification:

• Induction hypothesis: Any word $w$ generated from any
$X \in (\mathcal{N} \cup \Sigma)^*$ using $d$ derivations, $1 \le d < n$, is such that
$\lambda^\bullet_X(w) = \lambda^\circ_X(w)$.

• Rule expansion (Steps (2a) and (3a)): Consider a word $w$, generated using
$n$ derivations from some non-terminal $N$. Clearly, if $N \ne N_k$ or if the first
derivation used is $N \to X' \ne N_j.X$, then the rule used to generate $w$ is
not affected by the modification. The induction hypothesis applies and one
trivially gets $\lambda^\bullet_N(w) = \lambda^\circ_N(w)$.

Consider the initial state of the grammar. When $w$ results from a derivation
$N_k \to_x N_j.X$, then there exists a (unique) decomposition $w = w'.w''$,
where $w'$ is produced by the application of some rule $N_j \to_{x_i} X_i$, and
$w''$ is derived from $X$. The weight of $w$ is then given by

$$\lambda^\circ_{N_k}(w) = x \cdot x_i \cdot \lambda^\circ_{X_i}(w') \cdot \lambda^\circ_X(w'').$$

In the modified version of the grammar, $w = w'.w''$ unambiguously derives
from an application of the new rule $N_k \to_{x \cdot x_i} X_i.X$
($w' \in \mathcal{L}(X_i)$ and $w'' \in \mathcal{L}(X)$), with associated weight
$\lambda^\bullet_{N_k}(w) = x \cdot x_i \cdot \lambda^\bullet_{X_i}(w') \cdot \lambda^\bullet_X(w'')$.
Since both $w'$ and $w''$ are generated using fewer than $n$ derivations, the
induction hypothesis applies and one gets

$$\lambda^\bullet_{N_k}(w) = x \cdot x_i \cdot \lambda^\bullet_{X_i}(w') \cdot \lambda^\bullet_X(w'') = x \cdot x_i \cdot \lambda^\circ_{X_i}(w') \cdot \lambda^\circ_X(w'') = \lambda^\circ_{N_k}(w).$$

• Chain-rule reversal (Step (2b)): Any word produced using the initial
left-recursive chain-rule can be uniquely decomposed as
$w = w'.w''_1.\cdots.w''_p$, where $w'$ is generated from some
$N_k \to_{y_q} Y_q$, $q \in [1, m']$, and each $w''_i$ is generated by some rule
$N_k \to_{x_{q_i}} N_k.X_{q_i}$, $q_i \in [1, m]$. Its weight is therefore given by

$$\lambda^\circ_{N_k}(w) = y_q \cdot \left(\prod_{i=1}^{p} x_{q_i}\right) \cdot \lambda^\circ_{Y_q}(w') \cdot \left(\prod_{i=1}^{p} \lambda^\circ_{X_{q_i}}(w''_i)\right).$$

After chain-rule reversal, the same decomposition $w = w'.w''_1.\cdots.w''_p$
holds, but the sequence of derivations is now either $N_k \to_{y_q} Y_q$ ($w = w'$), or

$$N_k \to_{y_q} Y_q.N'_k \to_{x_{q_1}} Y_q.X_{q_1}.N'_k \to \cdots \to Y_q.X_{q_1}.\cdots.X_{q_{p-1}}.N'_k \to_{x_{q_p}} Y_q.X_{q_1}.\cdots.X_{q_p} \to^* w'.w''_1.\cdots.w''_p.$$

In both cases, the induction hypothesis applies for each element of the
decomposition, and the weight of $w$ in the new decomposition is given by

$$\lambda^\bullet_{N_k}(w) = y_q \cdot \left(\prod_{i=1}^{p} x_{q_i}\right) \cdot \lambda^\bullet_{Y_q}(w') \cdot \left(\prod_{i=1}^{p} \lambda^\bullet_{X_{q_i}}(w''_i)\right) = \lambda^\circ_{N_k}(w).$$

It follows that the weight of any word is left unchanged by the substitutions
performed in the algorithm. Since the generated language is also preserved, such
a preservation of the weights implies a preservation of the probabilities. We
conclude that the returned grammar, in addition to being in GNF, also induces the
same probability distribution as $\mathcal{G}_\lambda$. □
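The weight preservation under chain-rule reversal can be checked on a tiny case with the following Python sketch. It is an illustration under stated assumptions: rules are encoded as (left-hand side, right-hand side, weight) tuples, the fresh non-terminal is named by appending a prime, and the two-rule grammar with its weights is hypothetical.

```python
from fractions import Fraction

def reverse_chain(rules, nk):
    """Step (2b): remove the left recursion on nk while keeping rule weights.
    nk ->_x nk.X  becomes  nk' ->_x X.nk'  and  nk' ->_x X
    nk ->_y Y     becomes  nk  ->_y Y.nk'  and  nk  ->_y Y"""
    fresh = nk + "'"
    result = []
    for lhs, rhs, w in rules:
        if lhs != nk:
            result.append((lhs, rhs, w))
        elif rhs[:1] == (nk,):  # left-recursive rule
            result += [(fresh, rhs[1:] + (fresh,), w), (fresh, rhs[1:], w)]
        else:  # non-recursive rule
            result += [(nk, rhs + (fresh,), w), (nk, rhs, w)]
    return result

# Hypothetical left-recursive grammar: N ->_2 N.a | N ->_3 b
RULES = [("N", ("N", "a"), Fraction(2)), ("N", ("b",), Fraction(3))]
NEW = reverse_chain(RULES, "N")

# "baa" before: N -> N.a -> N.a.a -> b.a.a   (weights 2, 2, 3)
old_weight = RULES[0][2] * RULES[0][2] * RULES[1][2]
# "baa" after:  N -> b.N' -> b.a.N' -> b.a.a (weights 3, 2, 2)
new_weight = NEW[2][2] * NEW[0][2] * NEW[1][2]
```

The same multiset of weights is used before and after the reversal, only in a different order, which is why the product is unchanged.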

□ 



5. Conclusion 

Using a trivial modification of the Blum and Koch algorithm [1], we showed that
weighting terminals and weighting rules have equal expressive power, i.e. that any
distribution captured by the former formalism is also captured by the latter and
vice versa.

While both proofs are relatively trivial, going from rule-weighted grammars to
terminal-weighted grammars turned out to be more involved than the converse,
leading to an increase in the number of rules. However, this observation might
be deceptive, as the choice of the Greibach Normal Form as an intermediate form
is only one out of possibly many alternatives, and one could devise more efficient
grammar transforms capturing the same distributions. Moreover, it is noteworthy
that, even if one chooses to use GNF grammars, there still seems to be a gap
between the $O(|\mathcal{P}|^4)$ size of the grammar returned by the Blum and Koch
algorithm, and the minimum $O(|\mathcal{P}|^2)$ increase observed for some infinite
family of grammars, motivating the search for better GNF transformation algorithms.